Gaussian (Normal) Distribution

What is the Gaussian Distribution?

The Gaussian distribution, also known as the normal distribution, is a continuous probability distribution that is symmetric about its mean. It is a good approximate model for many natural quantities, such as heights, test scores, and measurement errors.

The probability density function (PDF) of the normal distribution is given by:

$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$

$\mu$ is the mean (center of the distribution).
$\sigma$ is the standard deviation (spread of the distribution).
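
To make the formula concrete, here is a minimal sketch that evaluates the PDF directly from the definition. The function name `normal_pdf` and the default values $\mu = 0$, $\sigma = 1$ are illustrative assumptions, not part of any particular library.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal PDF f(x) for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# The density is highest at the mean and symmetric around it.
peak = normal_pdf(0.0)
left, right = normal_pdf(-1.5), normal_pdf(1.5)
print(peak, left, right)
```

Increasing `sigma` lowers and widens the curve, which is why the peak height depends on the spread.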

Gaussian vs Normal Distribution

There is no difference between a Gaussian distribution and a normal distribution; both terms refer to the same distribution. "Gaussian" honors the mathematician Carl Friedrich Gauss, who studied this distribution in depth. "Normal" is the conventional name in statistics, where the distribution appears frequently as a model for natural data.

In summary, they are two names for the same bell-shaped distribution:

"Gaussian" is more common in mathematics and physics.
"Normal" is more common in statistics and data science.

Choosing the Number of Intervals (Bins)

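To choose how many intervals (bins) to use and how wide to make them when grouping continuous data from a normal distribution, several statistical rules are commonly used. In every rule below, a non-integer number of intervals $k$ is rounded up to the next whole number.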

✅ 1. Sturges’ Rule

Useful for approximately normal distributions and medium-sized datasets.

Number of intervals:

\[ k = 1 + \log_2(n) \]

Width of each interval:

\[ h = \frac{\max(x) - \min(x)}{k} \]

Best for sample sizes $ n < 2000 $
Data roughly normal
Clear, simple binning
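
As a quick illustration of Sturges' rule, the sketch below computes $k$ and $h$ from a sample size and a data range. The function name `sturges_bins` and the example numbers are hypothetical, and rounding $k$ up with `ceil` is an assumption about how to handle a non-integer result.

```python
import math

def sturges_bins(n, x_min, x_max):
    """Return (k, h) from Sturges' rule, rounding k up to a whole number."""
    if n < 1:
        raise ValueError("n must be at least 1")
    k = math.ceil(1 + math.log2(n))   # number of intervals
    h = (x_max - x_min) / k           # width of each interval
    return k, h

# Hypothetical example: 100 observations spanning 0 to 40.
k, h = sturges_bins(n=100, x_min=0.0, x_max=40.0)
print("k =", k, "h =", h)
```

Because $k$ grows only logarithmically with $n$, doubling the sample size adds just one interval.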

✅ 2. Square-root Rule

Number of intervals:

\[ k = \sqrt{n} \]

Width:

\[ h = \frac{\max(x) - \min(x)}{k} \]

Very simple and fast
No distribution assumptions
Good for small or medium datasets
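
The square-root rule can be sketched the same way. The name `sqrt_bins` and the example values are illustrative assumptions, and $k$ is again rounded up.

```python
import math

def sqrt_bins(n, x_min, x_max):
    """Return (k, h) from the square-root rule, rounding k up."""
    if n < 1:
        raise ValueError("n must be at least 1")
    k = math.ceil(math.sqrt(n))   # number of intervals
    h = (x_max - x_min) / k       # width of each interval
    return k, h

# Hypothetical example: 200 observations spanning 10 to 25.
k, h = sqrt_bins(n=200, x_min=10.0, x_max=25.0)
print("k =", k, "h =", h)
```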

✅ 3. Scott’s Rule

Scott’s rule chooses the interval width that minimizes the asymptotic mean integrated squared error of the histogram when the data are normally distributed.

Interval width:

\[ h = \frac{3.5 \,\sigma}{n^{1/3}} \]

Number of intervals:

\[ k = \frac{\max(x)-\min(x)}{h} \]

Excellent for truly normal data
Works well with large samples
More statistically grounded

✅ 4. Freedman–Diaconis Rule

Uses the interquartile range (IQR) instead of the standard deviation, making it robust to outliers.

Interval width:

\[ h = \frac{2 \cdot \mathrm{IQR}}{n^{1/3}} \]

Number of intervals:

\[ k = \frac{\max(x) - \min(x)}{h} \]

Best when data has outliers
Useful for skewed distributions
More robust than Scott or Sturges
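
The robustness claim can be checked with a small experiment. This is a sketch under stated assumptions: the function name `freedman_diaconis_bins` is hypothetical, the data are synthetic, and the single extreme value of 500 is an arbitrary choice of outlier.

```python
import numpy as np

def freedman_diaconis_bins(x):
    """Return (k, h) from the Freedman-Diaconis rule, rounding k up."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    if iqr == 0:
        raise ValueError("IQR is zero, so the rule gives no usable width")
    h = 2.0 * iqr / (x.size ** (1 / 3))          # interval width
    k = int(np.ceil((x.max() - x.min()) / h))    # number of intervals
    return k, h

# Synthetic sample, with and without one extreme value appended.
rng = np.random.default_rng(0)
sample = rng.normal(loc=50.0, scale=10.0, size=500)
sample_with_outlier = np.append(sample, 500.0)

k_clean, h_clean = freedman_diaconis_bins(sample)
k_out, h_out = freedman_diaconis_bins(sample_with_outlier)
print("clean:   k =", k_clean, "h =", h_clean)
print("outlier: k =", k_out, "h =", h_out)
```

The width $h$ barely moves when the outlier is added, because the IQR ignores the tails. The number of intervals still grows, since the range $\max(x) - \min(x)$ does include the outlier.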

🔍 Which Rule Should You Use?

Your Situation	Best Method
Data is normal + medium/large sample	Scott
Data is normal + small/medium sample	Sturges
Simple/quick binning	Square root
Outliers or heavy tails	Freedman–Diaconis
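
To compare the four rules on the same data, NumPy's `np.histogram_bin_edges` accepts the rule names `"sturges"`, `"sqrt"`, `"scott"`, and `"fd"` as the `bins` argument. The sketch below uses a synthetic sample as a stand-in for real data. NumPy's internal constants may differ slightly from the rounded values used here (for example, roughly 3.49 rather than 3.5 in Scott's rule), so counts can differ by a bin from a hand calculation.

```python
import numpy as np

# Synthetic standard-normal sample standing in for real data.
rng = np.random.default_rng(42)
data = rng.normal(loc=0.0, scale=1.0, size=1000)

bin_counts = {}
for rule in ("sturges", "sqrt", "scott", "fd"):
    edges = np.histogram_bin_edges(data, bins=rule)
    bin_counts[rule] = len(edges) - 1   # k edges + 1 boundaries -> k bins
    print(f"{rule:8s} bins = {bin_counts[rule]}")
```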

📌 Practical Example (Python)


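        import numpy as np

        # `values` is your data; a synthetic normal sample stands in for it here.
        rng = np.random.default_rng(0)
        values = rng.normal(loc=0.0, scale=1.0, size=1000)

        x = np.asarray(values, dtype=float)
        n = len(x)
        sigma = np.std(x, ddof=1)  # sample standard deviation

        # Scott's rule: compute the width first, then round the bin count up
        h = 3.5 * sigma / (n ** (1/3))
        k = int(np.ceil((x.max() - x.min()) / h))

        print("Number of bins =", k)
        print("Width =", h)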

📚 Summary

Number of intervals:

\[ k = \begin{cases} 1 + \log_2(n) & \text{Sturges} \\ \sqrt{n} & \text{Square root} \\ \frac{\max(x)-\min(x)}{3.5\,\sigma\,n^{-1/3}} & \text{Scott} \\ \frac{\max(x)-\min(x)}{2\,\mathrm{IQR}\,n^{-1/3}} & \text{Freedman–Diaconis} \end{cases} \]

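Interval width (for Sturges and square root, where $k$ is chosen first; Scott and Freedman–Diaconis define $h$ directly):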

\[ h = \frac{\max(x)-\min(x)}{k} \]
